42 research outputs found

    TweetLID : a benchmark for tweet language identification

    Language identification, the task of determining the language in which a given text is written, has progressed substantially in recent decades. However, three main issues remain unresolved: (1) distinguishing similar languages, (2) detecting multilingualism in a single document, and (3) identifying the language of short texts. In this paper, we describe our work on the development of a benchmark to encourage further research in these three directions, set forth an evaluation framework suitable for the task, and make a dataset of annotated tweets publicly available for research purposes. We also describe the shared task we organized to validate and assess the evaluation framework and dataset with systems submitted by seven different participants, and analyze the performance of these systems. The evaluation of the results submitted by the participants of the shared task helped us shed some light on the shortcomings of state-of-the-art language identification systems, and gave insight into the extent to which the brevity, multilingualism, and language similarity found in texts degrade the performance of language identifiers. Our dataset of nearly 35,000 tweets and the evaluation framework provide researchers and practitioners with suitable resources to further study these issues in language identification within a common setting that enables results to be compared with one another.

    Quantitative Social Dialectology: Explaining Linguistic Variation Geographically and Socially

    In this study we examine linguistic variation and its dependence on both social and geographic factors. We follow dialectometry in applying a quantitative methodology and focusing on dialect distances, and social dialectology in the choice of factors we examine in building a model to predict word pronunciation distances from the standard Dutch language to 424 Dutch dialects. We combine linear mixed-effects regression modeling with generalized additive modeling to predict the pronunciation distance of 559 words. Although geographical position is the dominant predictor, several other factors emerged as significant. The model predicts a greater distance from the standard for smaller communities, for communities with a higher average age, for nouns (as contrasted with verbs and adjectives), for more frequent words, and for words with relatively many vowels. The impact of the demographic variables, however, varied from word to word. For a majority of words, larger, richer and younger communities are moving towards the standard. For a minority of words, larger, richer and younger communities emerge as driving a change away from the standard. Similarly, the strength of the effects of word frequency and word category varied geographically. The peripheral areas of the Netherlands showed a greater distance from the standard for nouns (as opposed to verbs and adjectives) as well as for high-frequency words, compared to the more central areas. Our findings indicate that changes in pronunciation have been spreading (in particular for low-frequency words) from the Hollandic center of economic power to the peripheral areas of the country, meeting resistance that is stronger wherever, for well-documented historical reasons, the political influence of Holland was reduced. Our results are also consistent with the theory of lexical diffusion, in that distances from the Hollandic norm vary systematically and predictably on a word-by-word basis.

    Relativistic Binaries in Globular Clusters

    Galactic globular clusters are old, dense star systems typically containing 10^4 to 10^7 stars. As an old population of stars, globular clusters contain many collapsed and degenerate objects. As a dense population of stars, globular clusters are the scene of many interesting close dynamical interactions between stars. These dynamical interactions can alter the evolution of individual stars and can produce tight binary systems containing one or two compact objects. In this review, we discuss theoretical models of globular cluster evolution and binary evolution, techniques for simulating this evolution that leads to relativistic binaries, and current and possible future observational evidence for this population. Our discussion of globular cluster evolution will focus on the processes that boost the production of hard binary systems and the subsequent interaction of these binaries that can alter the properties of both bodies and can lead to exotic objects. Direct N-body integrations and Fokker-Planck simulations of the evolution of globular clusters that incorporate tidal interactions and lead to predictions of relativistic binary populations are also discussed. We discuss the current observational evidence for cataclysmic variables, millisecond pulsars, and low-mass X-ray binaries as well as possible future detection of relativistic binaries with gravitational radiation. Comment: 88 pages, 13 figures. Submitted update of Living Reviews article.

    Hand osteoarthritis: clinical phenotypes, molecular mechanisms and disease management

    Osteoarthritis (OA) is a highly prevalent condition and the hand is the most commonly affected site. Patients with hand OA frequently report symptoms of pain, functional limitations, and frustration in undertaking everyday activities. The condition presents clinically with changes to the bone, ligaments, cartilage and synovial tissue, which can be observed using radiography, ultrasonography or MRI. Hand OA is a heterogeneous disorder and is considered to be multifactorial in aetiology. This review provides an overview of the epidemiology, presentation and burden of hand OA, including an update on hand OA imaging (including the development of novel techniques), disease mechanisms and management. In particular, areas for which new evidence has substantially changed the way we understand, consider and treat hand OA are highlighted. For example, genetic studies, clinical trials and careful prospective imaging studies from the past 5 years are beginning to provide insights into the pathogenesis of hand OA that might uncover new therapeutic targets in disease.

    The Role of Indigenous Languages in Schools: The Case of Sarawak

    This chapter describes the role of indigenous languages in Sarawak schools, beginning with a brief background on the diversity of languages and indigenous language use patterns in the state. This is followed by a description of efforts to preserve and promote the formal learning of indigenous languages in various indigenous communities, with a special focus on the Bidayuh and Iban communities whose languages have been used for formal education. Efforts to preserve Sarawak indigenous languages in the early twentieth century took the form of producing orthography for the language. The Iban language has been standardised and offered as a school subject, but it is more difficult for Bidayuh to become a school subject due to the regional variations in Bidayuh isolects. In recent years, Bidayuh has been introduced as a medium of instruction in some preschools run by the Dayak National Bidayuh Association. The other Sarawak indigenous languages have some written materials, but they are far from being integrated into the Malaysian national curriculum. The initial effort in this direction has to come from the indigenous communities, but research has shown that belief in the heritage value of indigenous languages alone is not sufficient to mobilise community literacy activities on a long-term basis.